Polynomial-complexity, Low-delay Scheduling for 
SCFDMA-based Wireless Uplink Networks 

(Technical Report) 

Shreeshankar Bodas, Bilal Sadiq 
Qualcomm, Inc., Bridgewater, NJ 08807 






Abstract — Uplink scheduling/resource allocation under the single-carrier FDMA constraint is investigated, taking into account the queuing dynamics at the transmitters. Under the single-carrier constraint, both the MaxWeight scheduling problem and the problem of determining whether a given number of packets can be served from all the users are shown to be NP-complete. Finally, a matching-based scheduling algorithm is presented that requires only a polynomial number of computations per timeslot and, in a system with large bandwidth and a large user population, provably provides good delay (small-queue) performance, even under the single-carrier constraint.

In summary, the results in the first part of the paper support the recent push to remove SCFDMA from the Standards, whereas those in the second part present a way of working around the single-carrier constraint if it remains in the Standards.

Index Terms — Uplink scheduling, single-carrier FDMA, Batch-and-allocate

I. Introduction 

In recent years, we have witnessed an explosion in the number and capabilities of hand-held wireless communication devices, and consequently in their data consumption. Real-time, i.e., delay-constrained, data traffic (voice/video/gaming/...) constitutes a significant fraction of the overall over-the-air data demand. The demand for high-quality data, in large quantities, keeps growing, but wireless resources are not growing nearly as fast. It is therefore important to design efficient methods of sharing the resources across multiple users in order to guarantee a good quality of service. In this paper, we focus on the problem of resource allocation on the uplink (user to base-station) of wireless networks.

The 3GPP LTE (Long-Term Evolution) standard has chosen single-carrier frequency division multiple access (SCFDMA) as the uplink multiple access technology [1]. SCFDMA can be thought of as a special case of the orthogonal frequency division multiple access (OFDMA) technology used for the downlink of 3GPP LTE. In OFDMA, the available bandwidth at the base-station is partitioned into a number of orthogonal frequency sub-bands, and a given user can be allocated any subset of the frequency sub-bands for his/her downlink traffic, under the condition that a given frequency sub-band can be allocated to at most one user. In SCFDMA, there is an additional constraint that a given user can be allocated only consecutive frequency sub-bands. For example, consider a system with 2 users $x, y$ and 3 frequency sub-bands $f_1, f_2, f_3$. Then $(x, f_1), (x, f_2), (y, f_3)$ is a valid SCFDMA allocation, while $(x, f_1), (x, f_3), (y, f_2)$ is not. We refer to this additional constraint as the single-carrier constraint. The main reason for choosing SCFDMA for the uplink is that it results in a lower PAPR (peak-to-average power ratio) than OFDMA.
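As an illustration of the single-carrier constraint, the sketch below checks whether a list of (user, sub-band) pairs is a valid SCFDMA allocation. The function name `is_single_carrier` and the encoding of sub-band $f_k$ as the integer $k$ are our own illustrative assumptions.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def is_single_carrier(allocation: Iterable[Tuple[Hashable, int]]) -> bool:
    """Return True if the (user, sub_band) pairs form a valid SCFDMA allocation.

    Valid means: each sub-band goes to at most one user (the OFDMA condition),
    and each user's sub-bands are consecutive (the single-carrier constraint).
    """
    bands_of: Dict[Hashable, List[int]] = {}
    used = set()
    for user, band in allocation:
        if band in used:                      # sub-band allocated twice
            return False
        used.add(band)
        bands_of.setdefault(user, []).append(band)
    for bands in bands_of.values():
        # distinct integers are consecutive iff max - min + 1 equals their count
        if max(bands) - min(bands) + 1 != len(bands):
            return False
    return True


# The two allocations from the example (sub-band f_k encoded as k).
print(is_single_carrier([("x", 1), ("x", 2), ("y", 3)]))  # True
print(is_single_carrier([("x", 1), ("x", 3), ("y", 2)]))  # False
```

Because sub-bands are checked for uniqueness first, the span test on each user's bands is enough to detect a gap.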

In this paper, we show that the single-carrier constraint alone is enough to make certain scheduling problems hard (formally, NP-complete). The classic MaxWeight scheduler [2] is throughput-optimal for the uplink network under very mild assumptions on the arrival and channel processes (see [3]), but selecting a weight-maximizing schedule is NP-complete (Theorem 2). Another natural, myopic, "greedy" scheduler for the scheduling problem described in Section III operates as follows: given a queue-length vector and a matrix of the rates at which the frequency sub-bands can serve the individual user-queues, does there exist an allocation that serves $x_i$ packets from the user-queue $Q_i$? This scheduler is interesting because, by choosing appropriate values of the $x_i$'s in every scheduling period, the per-user queues can be kept small. For example, the values of $x_i$ can be chosen to equalize the queue-lengths after service. For the downlink scheduling problem, in the absence of the single-carrier constraint, this scheduler is shown to have good delay properties [4]; but under the single-carrier constraint, implementing it requires solving an NP-complete problem (Theorem 1).

In light of these negative results, we focus on a simple i.i.d. arrival and channel model, and design an algorithm called the Batch-and-allocate (BA) scheduler, which is the main contribution of this paper. This scheduler yields good delay (small-queue) performance and can be implemented with a polynomial number of computations per timeslot, even under the single-carrier constraint.

The qualitative messages from the paper are: (i) The single-carrier constraint, while attractive from a power-amplifier point of view, severely restricts the class of possible scheduling policies. There has been a recent push to remove it from the standards (e.g., clustered SCFDMA [5], [6]), and this paper can be seen as an argument in its favor. (ii) Although the uplink scheduling problem is intractable under the single-carrier constraint, we can guarantee a good quality of service for "regular" arrival and channel processes if the system has a large number of users and proportionally large bandwidth.
II. Related Work

Scheduling and resource allocation for the wireless uplink network is a well-investigated problem. Researchers have studied this problem from the point of view of maximizing a system-wide utility function [7], [8], [9], order-wise delay-optimal scheduling [10], successive interference cancellation to allow for simultaneous transmissions from users [11], and so on. Most of the previous work on the problem either does not consider the single-carrier constraint or allows for fractional server (i.e., frequency sub-band) allocation, thus circumventing the inherently discrete nature of the allocation problem. In wireless uplink systems where frequency sub-bands are grouped together, fractional server allocation is a reasonable assumption. A recurring theme in the prior work is to initially ignore the single-carrier constraint, come up with an allocation of the frequency sub-bands to the users that optimizes a certain objective, and then use heuristics to modify that allocation to incorporate the single-carrier constraint. This approach usually leads to a loss of performance. In contrast, in this paper we strictly adhere to the single-carrier constraint even in the algorithm design, and do not perform any fractional server allocations. We present an algorithm that is designed with the single-carrier constraint in mind and that yields good small-buffer performance under a variety of changes to the basic system model. To the best of our knowledge, this is the first characterization of the small-queue performance of the uplink network in the large-system limit.
III. System Model

We consider a discrete-time queuing system with $n$ queues and $n$ servers, as shown in Figure 1. Here the $n$ queues represent the packet queues at the $n$ uplink transmitters, and the $n$ servers represent the $n$ orthogonal uplink frequency sub-bands. The queues can store any number of packets until they are served, so that no packets are dropped. Table I summarizes the notation used throughout this paper.

Arrival and channel processes: We assume that the arrivals to the queues and the channel realizations are i.i.d. across queues, servers, and timeslots. More precisely,

1) The numbers of arrivals $A_i(t)$ to $Q_i$ at the beginning of timeslot $t$ are i.i.d. across timeslots and queues, and obey $\mathbb{P}(A_i(t) = m) = p_m$ for $0 \le m \le M$, with $p_i > 0$ for all $i$.

2) The numbers of packets $X_{ij}(t)$ that the server $S_j$ can serve from $Q_i$ in timeslot $t$ are i.i.d. across queues, servers, and timeslots, and obey $\mathbb{P}(X_{ij}(t) = k) = q_k$ for $0 \le k \le K$, with $q_i > 0$ for all $i$, and $\sum_{i=0}^{K} q_i = 1$.



Fig. 1. System Model



3) There exists a e (0, 1) such that ^ pi 

i=0 

We make the assumption $p_i > 0$ for all $0 \le i \le M$ only to avoid trivialities; our results and proof techniques do not depend on this assumption. We also assume that $M > K$, since otherwise allocating just one server (with the highest supported rate $K$) is enough to serve all the new arrivals to a queue in a given timeslot, and the single-carrier constraint can be easily circumvented by matching-based algorithms for the downlink, such as those in [12]. Our objective is to define a service policy, quantified by the random variables $Y_{ij}(t) \in \{0, 1\}$ for $i, j \in [n]$ and for all $t$, where $Y_{ij}(t) = 1$ if the server $S_j$ serves the queue $Q_i$ in timeslot $t$, and $0$ otherwise. The random variables $Y_{ij}(t)$ are allowed to depend upon the entire past of the system and on the arrivals and channel realizations in the current timeslot $t$, but are required to satisfy the following conditions:

1) $\sum_{i=1}^{n} Y_{ij}(t) \le 1$ for all $j, t$.

2) If $Y_{ir}(t) = Y_{is}(t) = 1$ for some $1 \le r < s \le n$, then $Y_{ij}(t) = 1$ for all $r \le j \le s$, for all $i \in [n]$.

The first condition implies that a given server can serve at most one queue in any timeslot. The second condition models the single-carrier constraint. The queues evolve according to

$$Q_i(t) = \Big(Q_i(t-1) + A_i(t) - \sum_{j=1}^{n} X_{ij}(t)\, Y_{ij}(t)\Big)^{+}. \qquad (1)$$
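Equation (1) translates directly into code. The following is a minimal sketch, assuming the allocation $Y$ already satisfies conditions 1) and 2) (they are not re-checked here); the function name `queue_update` and the list-of-lists representation are illustrative choices.

```python
from typing import List, Sequence


def queue_update(q_prev: Sequence[int], arrivals: Sequence[int],
                 X: Sequence[Sequence[int]], Y: Sequence[Sequence[int]]) -> List[int]:
    """Apply Equation (1) to every queue for one timeslot.

    q_prev[i] = Q_i(t-1), arrivals[i] = A_i(t),
    X[i][j]   = X_ij(t) (packets server j can serve from queue i),
    Y[i][j]   = Y_ij(t) in {0, 1} (1 if server j is allocated to queue i).
    """
    new_q = []
    for i in range(len(q_prev)):
        offered = sum(x * y for x, y in zip(X[i], Y[i]))          # sum_j X_ij(t) Y_ij(t)
        new_q.append(max(q_prev[i] + arrivals[i] - offered, 0))   # the (.)^+ operator
    return new_q


# Two queues, two servers: both servers (rate 2 each) go to queue 0.
print(queue_update([3, 0], [2, 1], [[2, 2], [2, 0]], [[1, 1], [0, 0]]))  # [1, 1]
```

The $(\cdot)^+$ clamp reflects that unused service capacity is simply wasted; it never makes a queue negative.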



Our objective is to define a scheduling policy that, for every integer $b \ge 0$, results in a strictly positive value of

$$I(b) := \liminf_{n \to \infty} -\frac{1}{n} \log \mathbb{P}\Big(\max_{1 \le i \le n} Q_i(t) > b\Big),$$

where $\mathbb{P}(\cdot)$ refers to the stationary distribution of the queue-length process. The function $I(\cdot)$ is called the rate-function in large deviations theory [13]. In order to guarantee good small-queue performance, our true objective is to minimize the "overflow" probability, i.e., the probability of the event $\{\max_{1 \le i \le n} Q_i(t) > b\}$. In real systems with a large number of users and proportionally large bandwidth, rate-function maximization is a useful and reasonable surrogate for this objective. If $I(b) > 0$, then the probability of the overflow event decays to 0 exponentially fast as the system-size grows. Hence in this paper, we focus on policies that result in a strictly positive value of the rate-function. The assumption 3) is a necessary condition for the rate-function to be nonzero, even without the single-carrier constraint [14]. Our main contribution is an algorithm that yields a positive value of the rate-function under this assumption.

TABLE I
Notation

$\mathcal{Q}$: The set of $n$ queues $\{Q_1, \ldots, Q_n\}$
$\mathcal{S}$: The set of $n$ servers $\{S_1, \ldots, S_n\}$
$Q_i(t)$: The length of $Q_i$ at the end of timeslot $t$
$\bar{Q}(t)$: $\max\{Q_i(t) : 1 \le i \le n\}$
$X_{ij}(t)$: The number of packets that the server $S_j$ can potentially serve from $Q_i$ in timeslot $t$
$A_i(t)$: The number of arrivals to $Q_i$ at the beginning of timeslot $t$
$[n]$: The set $\{i : 1 \le i \le n\}$
$a^+$: $\max(a, 0)$
$|A|$: The cardinality of the set $A$
$\mathbb{R}_+$: The set $[0, \infty)$ of nonnegative real numbers
$\Delta_d$: The probability simplex in $\mathbb{R}^d$

Note: In the rest of the paper, to simplify notation, we make statements like "allocate $n/2$ servers to a queue." What we actually mean is the integer part (floor) of the corresponding fraction; we never make fractional server allocations. Since we are interested in large deviations results ($n$ large), the rounding has no effect on the analysis, and we do not discuss this issue further in this paper.

IV. Computational Hardness 

In this section, we establish that in the presence of the single-carrier constraint, implementing certain (otherwise simple and interesting) scheduling policies requires solving NP-complete problems. We use a construction almost identical to the one in [15]. In [15], the authors establish the NP-hardness of the single-carrier scheduling problem in the context of proportionally fair (PF) scheduling. Their reduction can be modified to suit our case. We provide a detailed account here, as opposed to merely citing their result, because: (i) their result is not directly applicable in our case, since it is concerned with PF scheduling, and (ii) we found their construction hard to follow, with a number of key proof details missing.

In the multi-queue multi-server setup described here, a natural, myopic way to minimize the probability that the longest queue exceeds a given constant $b$ is to select, in every timeslot, the allocation of the servers to the queues that minimizes the maximum queue-length. This requires answering the question: can each queue $Q_i$ be allocated at least $w_i$ units of service, $i \in [n]$? A simpler question, formalized in Definition 1, is: can a total of $W$ packets be drained from the queues? Our objective is to show that even this simpler problem is NP-complete under the single-carrier constraint.

Definition 1 (Packet-draining problem (PD)): Consider a queue-length vector $[Q_1, \ldots, Q_k]$ and a set of servers $\{S_1, \ldots, S_m\}$, where the server $S_j$ can serve $X_{ij}$ packets from the queue $Q_i$. A finite integer $W > 0$ is given. Determine whether, under the single-carrier allocation constraint, there exists an allocation of the servers to the queues that serves a total of at least $W$ packets. ◇
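To make Definition 1 concrete, here is a hypothetical brute-force decision procedure for tiny PD instances. It enumerates all $(k+1)^m$ ways of assigning each server to a queue or leaving it idle, so its running time is exponential in $m$; this is consistent with, but of course does not prove, Theorem 1. We assume that a queue contributes $\min(Q_i, \text{offered service})$ drained packets, in line with Equation (1); the helper names are our own.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def is_contiguous(servers: List[int]) -> bool:
    """True if the (increasing) server indices form one consecutive run, or are empty."""
    return not servers or servers[-1] - servers[0] + 1 == len(servers)


def packet_draining(Q: Sequence[int], X: Sequence[Sequence[int]], W: int
                    ) -> Optional[Tuple[Optional[int], ...]]:
    """Exhaustive search for a single-carrier allocation that drains >= W packets.

    alloc[j] is the queue served by server j, or None if server j is idle, so each
    server serves at most one queue by construction. Returns a witness or None.
    """
    k, m = len(Q), len(X[0])
    for alloc in product([None, *range(k)], repeat=m):        # (k + 1)^m candidates
        drained, feasible = 0, True
        for i in range(k):
            mine = [j for j in range(m) if alloc[j] == i]       # increasing order
            if not is_contiguous(mine):                         # single-carrier constraint
                feasible = False
                break
            drained += min(Q[i], sum(X[i][j] for j in mine))   # cannot drain more than Q_i
        if feasible and drained >= W:
            return alloc
    return None


# Queue 0 wants servers 0 and 2, but taking both forces it to take server 1 too.
X = [[2, 0, 2],
     [0, 2, 0]]
print(packet_draining([4, 2], X, 6))              # None: at most 4 packets under single-carrier
print(packet_draining([4, 2], X, 4) is not None)  # True
```

Without the contiguity check, the same instance could drain 6 packets (servers 0 and 2 to queue 0, server 1 to queue 1), which shows how the single-carrier constraint alone changes the answer.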

Theorem 1: The packet-draining problem (PD) is NP- 
complete. 

Proof: Please see Appendix A. ■
We now focus on the problem of MaxWeight scheduling under the single-carrier constraint. This classic scheduling algorithm was introduced in [2] and is known to be throughput-optimal (i.e., it makes the queue-length Markov chain positive recurrent whenever any other algorithm can do so) in a variety of situations, including under the single-carrier constraint, even under more general (e.g., correlated) arrival and channel processes [3]. But as established next, implementing it is computationally intractable unless P = NP.



Definition 2 (MaxWeight problem (PM)): Consider a set of queues $[Q_1, \ldots, Q_k]$ with lengths $[L_1, \ldots, L_k]$, and a set of servers $\{S_1, \ldots, S_m\}$, where the server $S_j$ can serve $X_{ij}$ packets from the queue $Q_i$. A finite integer $W > 0$ is given. Let $Y_{ij} = 1$ if the server $S_j$ is allocated to $Q_i$, and $0$ otherwise. Determine whether, under the single-carrier allocation constraint, there exists an allocation of the servers to the queues with

$$\sum_{i=1}^{k} \sum_{j=1}^{m} L_i X_{ij} Y_{ij} \ge W.$$

In the (PM) problem, we refer to the quantity $\sum_{i=1}^{k} \sum_{j=1}^{m} L_i X_{ij} Y_{ij}$ as the weight of the allocation.

Theorem 2: The MaxWeight problem (PM) is NP- 
complete. 

Proof: Please see Appendix B. ■

V. The Batch-and-allocate Algorithm 

The computational hardness results in Section IV imply that unless P = NP, there does not exist a computationally efficient scheduling algorithm that guarantees throughput optimality under general arrival and channel conditions. On the other hand, the user-experienced quality of service crucially depends upon good delay performance. Hence we focus on designing a computationally tractable algorithm that gives good delay performance under a restricted class of arrival and channel processes, namely, i.i.d. arrivals and channels with bounded support, as specified in Section III. We call this algorithm the Batch-and-allocate (BA) algorithm. We first define the Selective-allocate (SA) algorithm, which is used as a "black box" in the BA algorithm.

Selective-allocate (SA) algorithm:
Input:

1) An integer $k \ge 1$.

2) A bipartite graph $G(U \cup V, \mathcal{E})$ with $|V| \ge k|U|$. Let $U = \{u_1, \ldots, u_x\}$ and $V = \{v_1, \ldots, v_y\}$.

Steps:

1) Partition the nodes in the set $\{v_1, \ldots, v_{kx}\}$ into disjoint subsets $V_1, \ldots, V_x$ such that $V_i = \{v_{(i-1)k+1}, \ldots, v_{ik}\}$. Let $V' := \{V_1, \ldots, V_x\}$.

2) Construct a new graph $H(U \cup V', \mathcal{E}')$ where an edge $(u_i, V_j)$ is present in $\mathcal{E}'$ if the node $u_i$ is connected to every node in the set $V_j$ in the original graph $G$.

3) Find a maximum-cardinality matching $\mathcal{M}$ in the graph $H$, breaking ties arbitrarily.

Output: The matching $\mathcal{M}$. ◇

The SA algorithm groups the nodes in the set $V$ into sets of size $k$ each, and matches each such group $V_i$ to a node $u_j \in U$ that is connected to each node in the group $V_i$. One can think of each node in the set $U$ as a queue, each node in the set $V$ as a server, and the presence of an edge as signifying that the server can serve the given queue. An example of the SA algorithm for the case $k = 2$ is shown in Figure 2. Here the solid edges in the graph $H$ represent the matching $\mathcal{M}$. We write $\mathcal{M} = SA(k, G)$ for the output of the SA algorithm.
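The three steps of SA can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the graph is given as a set of (queue, server) edge pairs, and we use Kuhn's augmenting-path algorithm for step 3, although any maximum bipartite matching routine (e.g., Hopcroft–Karp) would serve.

```python
from typing import Dict, List, Optional, Set, Tuple


def max_bipartite_matching(adj: Dict[int, List[int]], n_right: int) -> Dict[int, int]:
    """Maximum-cardinality matching via augmenting paths (Kuhn's algorithm).

    adj[u] lists the right-hand nodes (0..n_right-1) adjacent to left node u.
    Returns {left_node: right_node}.
    """
    owner: List[Optional[int]] = [None] * n_right    # owner[g] = left node matched to g

    def augment(u: int, seen: Set[int]) -> bool:
        for g in adj[u]:
            if g not in seen:
                seen.add(g)
                if owner[g] is None or augment(owner[g], seen):
                    owner[g] = u
                    return True
        return False

    for u in adj:
        augment(u, set())
    return {u: g for g, u in enumerate(owner) if u is not None}


def selective_allocate(k: int, U: List[int], V: List[int],
                       edges: Set[Tuple[int, int]]) -> Dict[int, List[int]]:
    """SA(k, G): match each u in U to a group of k consecutive nodes of V.

    edges contains (u, v) for every edge of G; V is listed in server order.
    """
    x = len(U)
    assert len(V) >= k * x, "SA requires |V| >= k|U|"
    groups = [V[g * k:(g + 1) * k] for g in range(x)]            # step 1: V_1, ..., V_x
    adj = {u: [g for g, grp in enumerate(groups)                 # step 2: edge (u, V_g)
               if all((u, v) in edges for v in grp)]             # iff u ~ every v in V_g
           for u in U}
    matching = max_bipartite_matching(adj, len(groups))          # step 3
    return {u: groups[g] for u, g in matching.items()}


# k = 2, two queues (0, 1) and four servers (10..13); queue 1 cannot use group [10, 11].
edges = {(0, 10), (0, 11), (1, 10), (1, 12), (1, 13)}
print(selective_allocate(2, [0, 1], [10, 11, 12, 13], edges))
# {0: [10, 11], 1: [12, 13]}
```

Because every group is a contiguous slice of $V$, a matched queue always receives consecutively-numbered servers whenever $V$ itself is listed in server order, which is how SA respects the single-carrier constraint.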

Fig. 2. SA algorithm: example, showing the given graph $G$ and the constructed graph $H$ with the matching $\mathcal{M}$ (solid lines).

Batch-and-allocate (BA) algorithm:
Input:

1) The vector of queue-lengths, $Q_1(t-1), \ldots, Q_n(t-1)$.

2) The vector of arrivals, $A_1(t), \ldots, A_n(t)$.

3) The channel realizations, $X_{ij}(t)$ for $i, j \in [n]$.

Steps:

1) Calculate $\bar{Q}(t-1) := \max_{1 \le i \le n} Q_i(t-1)$. If $X_{ij}(t) < K$ for some pair $(i, j)$, then set $X_{ij}(t) = 0$ for that pair, and use this value of $X_{ij}(t)$ throughout the rest of the algorithm.

2) For $1 \le r \le m_0$, define
$$\mathcal{D}_r := \{i \in [n] : \bar{Q}(t-1) + (r-1)K + 1 \le Q_i(t-1) + A_i(t) \le \bar{Q}(t-1) + rK\}$$
to be the set of queue-indices $i$ such that the queue $Q_i$ needs to be allocated exactly $r$ servers to ensure $Q_i(t) \le \bar{Q}(t-1)$. Let
$$\mathcal{D}_0 := \{i \in [n] : Q_i(t-1) + A_i(t) = \bar{Q}(t-1)\}$$
be the set of queue-indices $i$ such that, after arrivals, the length of $Q_i$ equals the maximum queue-length at the end of the previous timeslot. We allocate servers only to some of the queues in the sets $\mathcal{D}_i$, $0 \le i \le m_0$. Let $d_i := |\mathcal{D}_i|$.

3) Let $0 \le a \le m_0 + 1$ be the smallest integer such that $\sum_{i=a}^{m_0} i\, d_i \le n$. Here $a = m_0 + 1$ implies that the previous summation is vacuous (equal to 0), i.e., $m_0 d_{m_0} > n$. Let $n' := n - \sum_{i=a}^{m_0} i\, d_i$.

4) Case $a \le m_0$: Let $c \in \{a, a+1, \ldots, m_0\}$ be the largest integer such that $d_c + d_{c+1} + \cdots + d_{m_0} \ge n'/2$. For each $i \in \{c+1, c+2, \ldots, m_0\}$, define the set of servers $\mathcal{T}_i$ satisfying
$$|\mathcal{T}_i| = (i+1)\, d_i + \frac{n'}{2(m_0 - a + 1)}.$$
For each $i \in \{a, a+1, \ldots, c-1\}$, define the set of servers $\mathcal{T}_i$ satisfying
$$|\mathcal{T}_i| = i\, d_i + \frac{n'}{2(m_0 - a + 1)}.$$
Define $n'' := n'/2 - (d_{c+1} + d_{c+2} + \cdots + d_{m_0})$. Define the set of servers $\mathcal{T}_c$ satisfying
$$|\mathcal{T}_c| = (c+1)\, n'' + c\,(d_c - n'') + \frac{n'}{2(m_0 - a + 1)}.$$
Ensure that the servers in $\mathcal{T}_i$ are consecutively numbered for all $i \in \{a, a+1, \ldots, m_0\}$.

Case $a = m_0 + 1$: Define the set of servers $\mathcal{T}_{m_0} = \mathcal{S}$, the set of all the servers.

5) Allocating servers to queues:

Case $a \le m_0$:

a) For every $i \in \{c+1, c+2, \ldots, m_0\}$, let $G_i$ be the restriction of the graph $G(\mathcal{Q} \cup \mathcal{S}, \mathcal{E})$ where the set of queues is restricted to indices in $\mathcal{D}_i$, and the set of servers to $\mathcal{T}_i$. Compute $\mathcal{M}_i = SA(i+1, G_i)$. For every $i \in \{a, a+1, \ldots, c-1\}$, let $G_i$ be the restriction of the graph $G(\mathcal{Q} \cup \mathcal{S}, \mathcal{E})$ where the set of queues is restricted to indices in $\mathcal{D}_i$, and the set of servers to $\mathcal{T}_i$. Compute $\mathcal{M}_i = SA(i, G_i)$. If $i = 0$, compute $\mathcal{M}_0 = SA(1, G_0)$.

b) Let $\mathcal{D}_c' \subseteq \mathcal{D}_c$ be any subset satisfying $|\mathcal{D}_c'| = n''$, and $\mathcal{D}_c'' = \mathcal{D}_c \setminus \mathcal{D}_c'$. Let $\mathcal{T}_c' \subseteq \mathcal{T}_c$ be a subset satisfying $|\mathcal{T}_c'| = (c+1)\, n'' + n'/(4(m_0 - a + 1))$, and $\mathcal{T}_c'' = \mathcal{T}_c \setminus \mathcal{T}_c'$. Ensure that the servers in $\mathcal{T}_c'$ and in $\mathcal{T}_c''$ are consecutively numbered. Let $G_c'$ (resp. $G_c''$) be the restriction of $G$ where the set of queues is restricted to indices in $\mathcal{D}_c'$ (resp. $\mathcal{D}_c''$), and the set of servers to $\mathcal{T}_c'$ (resp. $\mathcal{T}_c''$). Compute $\mathcal{M}_c' = SA(c+1, G_c')$ and $\mathcal{M}_c'' = SA(c, G_c'')$. Let $\mathcal{M}_c = \mathcal{M}_c' \cup \mathcal{M}_c''$.

For $a \le i \le m_0$, allocate the servers to the queues as dictated by $\mathcal{M}_i$: if $(Q_x, W_y) \in \mathcal{M}_i$ for some queue $Q_x$ with $x \in \mathcal{D}_i$ and a set of servers $W_y$, then allocate the servers in $W_y$ to $Q_x$, and accordingly define the allocation random variables $Y_{ij}(t)$.

Case $a = m_0 + 1$: Let $G_{m_0}$ be the restriction of the graph $G(\mathcal{Q} \cup \mathcal{S}, \mathcal{E})$ where the set of queues is restricted to indices in $\mathcal{D}_{m_0}$, and the set of servers to $\mathcal{T}_{m_0} = \mathcal{S}$. Compute $\mathcal{M}_{m_0} = SA(n/m_0, G_{m_0})$. Allocate the servers to the queues as dictated by $\mathcal{M}_{m_0}$.

6) Update the queue-lengths to account for service as per Equation (1).

Output:

1) The allocations, $Y_{ij}(t)$ for $i, j \in [n]$.

2) The final queue-lengths, $Q_i(t)$.

Informally, the algorithm tries to reduce the length of each queue after arrivals to the maximum queue-length before arrivals. In order to limit the number of search possibilities, the algorithm only considers channels that have the maximum rate $K$. The algorithm groups the queues into disjoint sets such that the queues in each group require the same number of servers to attain a queue-length less than or equal to the maximum queue-length at the end of the previous timeslot. It then determines the number of servers to allocate to the queues in each group, which is somewhat more than the bare minimum required to reduce each queue-length to the desired value. It assigns subsets of consecutively-numbered servers to each group of queues. The SA algorithm is used to make assignment decisions within each group of queues and the respective set of servers.
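The batching part of BA (the maximum computation in step 1, the sets $\mathcal{D}_r$ of step 2, and the quantities $a$ and $n'$ of step 3) can be sketched as below. This is a hypothetical helper, not the authors' code: it assumes $0 \le A_i(t) \le M$, takes the step-3 condition to be $\sum_{i=a}^{m_0} i\, d_i \le n$, and omits the rate-$K$ filtering of step 1 and the server-set construction and SA calls of steps 4 and 5.

```python
import math
from typing import Dict, List, Tuple


def ba_grouping(q_prev: List[int], arrivals: List[int], K: int, M: int
                ) -> Tuple[Dict[int, List[int]], int, int]:
    """Batch queues by the number of rate-K servers they need (BA steps 1-3 only).

    Returns (D, a, n_prime): D[r] lists the queue indices in D_r (0 <= r <= m0),
    a is the smallest index with sum_{i=a}^{m0} i * d_i <= n, and
    n_prime = n - that sum.
    """
    n = len(q_prev)
    m0 = math.ceil(M / K)
    q_bar = max(q_prev)                          # max queue-length at the end of t-1
    D: Dict[int, List[int]] = {r: [] for r in range(m0 + 1)}
    for i, (q, arr) in enumerate(zip(q_prev, arrivals)):
        excess = q + arr - q_bar                 # packets above the previous maximum
        if excess == 0:
            D[0].append(i)
        elif excess > 0:
            D[math.ceil(excess / K)].append(i)   # (r-1)K + 1 <= excess <= rK
        # queues with excess < 0 belong to no D_r and receive no servers
    d = {r: len(D[r]) for r in D}
    for a in range(m0 + 2):                      # a = m0 + 1 gives an empty sum
        total = sum(i * d[i] for i in range(a, m0 + 1))
        if total <= n:
            return D, a, n - total
    raise AssertionError("unreachable: the empty sum at a = m0 + 1 is 0 <= n")


print(ba_grouping([4, 4, 2, 0], [3, 0, 2, 1], K=2, M=5))
# ({0: [1, 2], 1: [], 2: [0], 3: []}, 0, 2)
```

Note that each queue's required number of servers depends only on its post-arrival excess over $\bar{Q}(t-1)$, so the batching takes a single pass over the queues.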

Some features of the algorithm are: (i) It is a real-time algorithm: it does not need to know the statistical system parameters (e.g., the probabilities) in order to be implemented. (ii) It results in a strictly positive value of the rate-function (Theorem 3). (iii) It can be implemented in polynomial time (Theorem 4).

In order to limit complexity, the algorithm treats the smaller channel-rates as 0. In spite of this "wastage," the algorithm gives good small-queue performance (Theorem 3). So the message is: for good delay performance, even under the single-carrier constraint, it is enough to focus on the highest-rate channels alone. We first establish an important property of the SA algorithm.

Lemma 1: Consider a graph $G(U \cup V, \mathcal{E})$ with $|V| = r \ge k|U|$. Suppose that for any pair of nodes $u \in U$, $v \in V$, the edge $(u, v)$ is present in $\mathcal{E}$ with probability $q$, independently of all other random variables. Let $\mathcal{M} = SA(k, G)$. Then for $r$ large enough,
$$\mathbb{P}(|\mathcal{M}| < |U|) \le 3 \lfloor r/k \rfloor (1 - q^k)^{\lfloor r/k \rfloor}.$$

Proof: Please see Appendix C. ■

Note that the RHS of the above expression tends to 0 as $r \to \infty$ for a fixed $k$. Now our objective is to show that under the BA algorithm, in every timeslot, the probability that the maximum queue-length in the system increases is "small" for $n$ large. Define $m_0 := \lceil M/K \rceil$.

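Lemma 2: Fix any $\epsilon \in (0, \alpha/(2Mm_0))$. Define the set $B_\epsilon$ of probability measures "near" the distribution of the arrival process as
$$B_\epsilon := \{[x_0, \ldots, x_M] \in \Delta_{M+1} : |x_i - p_i| < \epsilon \ \ \forall\, 0 \le i \le M\}.$$
For $\epsilon \in \mathbb{R}_+$, define
$$r(\epsilon) := \inf_{x \in \Delta_{M+1} \setminus B_\epsilon} \sum_{i=0}^{M} x_i \log \frac{x_i}{p_i}.$$
Here $r : \mathbb{R}_+ \to \mathbb{R}_+ \cup \{\infty\}$. Fix any $p \in (0, 1)$. Then under the BA algorithm, for $n$ large enough, for any timeslot $t$,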



p(Q(f + l) >Q(t)) 



4mo(mo + 1) 



(l-g™")U™or™o+i)J. 



Proof: Please see Appendix D. ■
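The quantity $r(\epsilon)$ in Lemma 2 is an infimum of the relative entropy (Kullback–Leibler divergence) of $x$ with respect to the arrival distribution $p$; by Sanov's theorem, an infimum of this form is the exponent governing the probability that the empirical arrival distribution across $n$ i.i.d. queues falls outside $B_\epsilon$. The rough numerical sketch below estimates it by random search over the simplex. The helper names and the random-search approach are our own assumptions, not part of the paper; sampling can only find points that exist, so the result is an upper bound on the true infimum.

```python
import math
import random
from typing import Sequence


def kl(x: Sequence[float], p: Sequence[float]) -> float:
    """Relative entropy sum_i x_i log(x_i / p_i), with 0 log 0 = 0 (assumes p_i > 0)."""
    return sum(xi * math.log(xi / pi) for xi, pi in zip(x, p) if xi > 0)


def in_B(x: Sequence[float], p: Sequence[float], eps: float) -> bool:
    """Membership in B_eps: every coordinate within eps of the arrival pmf p."""
    return all(abs(xi - pi) < eps for xi, pi in zip(x, p))


def r_upper_estimate(p: Sequence[float], eps: float,
                     samples: int = 20000, seed: int = 0) -> float:
    """Random-search upper bound on r(eps) = inf over the simplex minus B_eps of kl(x, p).

    Returns math.inf if no sampled point leaves B_eps.
    """
    rng = random.Random(seed)
    best = math.inf
    for _ in range(samples):
        w = [rng.expovariate(1.0) for _ in p]    # normalized exponentials are
        s = sum(w)                               # uniform on the simplex
        x = [wi / s for wi in w]
        if not in_B(x, p, eps):
            best = min(best, kl(x, p))
    return best


p = [0.5, 0.3, 0.2]                              # a toy arrival pmf with M = 2
print(r_upper_estimate(p, 0.05), r_upper_estimate(p, 0.1))
```

With a fixed seed, the set of sampled points outside $B_{0.1}$ is a subset of those outside $B_{0.05}$, so the estimate is non-decreasing in $\epsilon$, mirroring the monotonicity of $r(\cdot)$ itself.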
We now show that for $n$ large, the probability that the maximum queue-length in the system decreases within a constant number of timeslots is at least 1/2.

Lemma 3: Under the BA algorithm, for $n$ large, there exists a constant integer $k_0$ such that
$$\mathbb{P}\big(\bar{Q}(t + k_0) \le \bar{Q}(t) - 1 \,\big|\, \bar{Q}(t) > 0\big) \ge \frac{1}{2}.$$

Further, = [^] is a valid choice. 

Proof: Please see Appendix E. ■
As a result of Lemmas 2 and 3, the maximum queue-length in the system has the following behavior:

1) In a given timeslot, it increases with a probability that is exponentially small in $n$, and if it increases, the amount of increase is no more than $M$, which is a constant independent of $n$.

2) Over a constant number of timeslots, it decreases with at least a constant (= 1/2) probability.

Thus, it is reasonable to expect that the stationary distribution of the maximum queue-length is strongly concentrated near 0, which is formally established next.

Theorem 3: Under the BA algorithm, the stationary distribution of the maximum queue-length in the system obeys
$$\liminf_{n \to \infty} -\frac{1}{n} \log \mathbb{P}\Big(\max_{1 \le i \le n} Q_i(t) > b\Big) \ge \frac{b+1}{4m_0(m_0+1)} \log \frac{1}{1 - q_K^{m_0+1}} > 0.$$

Proof: Please see Appendix F. ■

Thus the proposed BA algorithm results in a strictly positive 
value of the rate function. Next we analyze its complexity. 

Theorem 4: The BA algorithm can be implemented in 
0(71^-^) computations per timeslot. 

Proof: Please see Appendix G. ■

We conclude this section by showing that there is a finite upper bound on the rate-function under any algorithm. The purpose is to establish that in the multi-queue multi-server setup considered in this paper, the probability of the overflow event decays at best exponentially in $n$, i.e., like $e^{-cn}$ for some constant $c > 0$, and not like $e^{-n^2}$ or $e^{-n \log n}$, etc.

Theorem 5: Fix $\theta \in (0, M/K - 1)$. Define $C_\theta := \{x \in \Delta_{M+1} : \sum_{i=0}^{M} i\, x_i \ge K(1+\theta)\}$ and

M 



yeAA/+i\Ca 



1=0 



Then under any algorithm for allocating servers to the queues, 

,. . „ -1 . , 



n— >cxD Ji 



l<i<n 





'b+l 
Proof: Please see Appendix H. ■
Thus there is at most a constant-factor gap from optimality 
for the rate function under the BA algorithm. 

VI. Extensions 

The BA algorithm presented in Section V can be easily extended to a variety of cases of interest.

(i) Unequal number of queues and servers: This case is of practical importance, because in typical uplink wireless systems, the number of active users is smaller than the number of orthogonal frequency sub-bands. The BA algorithm can be easily modified to utilize this "extra" service capacity, as follows. Suppose we have a system with $n$ users and $rn$ frequency sub-bands (servers) for some $r > 1$. We refer to $r$ as the over-provision factor. In step 4 of the BA algorithm, we give $r$ times as many servers to each group of queues $\mathcal{D}_i$ as in the case of $n$ queues and $n$ servers. As a result, the rate-function lower bound of Theorem 3 scales up by a factor of $r$. Formally, under the BA algorithm, the stationary distribution of the maximum queue-length in the system obeys
$$\liminf_{n \to \infty} -\frac{1}{n} \log \mathbb{P}\Big(\max_{1 \le i \le n} Q_i(t) > b\Big) \ge \frac{r(b+1)}{4m_0(m_0+1)} \log \frac{1}{1 - q_K^{m_0+1}} > 0.$$

We omit the proof details.

(ii) Different priorities to queues: The BA algorithm can be used in the case where the queues have different priorities. In this setup, we are interested in minimizing the probability of the event $\{\max_{i \in [n]} a_i Q_i(t) > b\}$, where $0 < a_{\min} \le a_i \le 1$ are given numbers. The BA algorithm then operates on the "effective" queue-lengths $a_i Q_i(t)$, and yields rate-function results similar to Theorem 3.

VII. Simulation Results 

We now analyze the performance of the proposed Batch-and-allocate (BA) algorithm through simulations. The goals are threefold: (i) The rate-function results for the BA algorithm are asymptotic, i.e., they hold as the number of users ($n$) and the number of sub-bands tend to infinity. We want to understand how large $n$ needs to be to get good small-buffer performance. (ii) We want to understand the (beneficial) impact of having more frequency sub-bands than users, which is typically the case in today's wireless uplink systems. (iii) We want to compare the BA algorithm's performance to that of an OFDMA-based greedy algorithm [16] that operates in the absence of the single-carrier constraint, in order to quantify the performance loss due to the single-carrier constraint. In the simulations, we run the OFDMA-based algorithm with as many servers as users (i.e., over-provision factor $r = 1$).

For simulation purposes, we arbitrarily assume an arrival-process distribution of the form $(x+1)e^{\ldots}$ on the bounded support $\{0, 1, \ldots, 5\}$, normalized. We assume that the channel-rates are either 0 or 2 packets per timeslot. Thus $M = 5$ and $K = 2$
in the paper's notation. We refer to the quantity Pi 
as the effective load. In our case, the effective load is about 62%. We vary the channel ON probability, $q$, from 0.7 to 0.9, and plot the empirical probability of buffer overflow vs.
buffer-size, averaged over 10^ timeslots. 

The results are presented in Figure 3. As we can see, the presence of the single-carrier constraint significantly degrades the small-buffer performance: the buffer overflow probabilities in the absence of the single-carrier constraint are substantially lower than in its presence. We see that the buffer overflow probability decreases with increasing system-size, as expected, since the overflow probability is exponentially small in the system-size. We also see that changing the over-provisioning factor from 1.5 to 2 provides some performance boost. This confirms that the BA algorithm can seamlessly utilize more frequency sub-bands. Most interestingly, the asymptotic rate-function results for the BA algorithm already manifest themselves as good small-buffer performance at $n = 50$. We have seen comparable performance for the case $n = 40$. Thus, the proposed BA algorithm yields good small-queue performance at realistic system-sizes.




Fig. 3. Performance of the BA algorithm (empirical buffer-overflow probability vs. maximum queue-length $b$; 50 users)

VIII. Conclusions 

We considered the problem of user scheduling in wireless uplink networks. The distinguishing feature that makes this problem harder than the OFDM downlink scheduling problem is the presence of the single-carrier constraint. We showed that under the single-carrier constraint, the MaxWeight problem and the packet-draining problem are NP-complete. We presented the Batch-and-allocate algorithm, which has polynomial complexity per timeslot and good small-queue performance for a class of bounded arrival and channel processes. The algorithm is robust to changes in the system model. The results were validated through analysis and simulations.
Appendix A
Proof of Theorem 1

The problem (PD) clearly belongs to the class NP: a certificate is an allocation of the servers to the queues that serves a total of at least $W$ packets from the queues. In order to show that it is NP-complete, we reduce the Hamiltonian path problem, which is NP-complete ([17], Ch. 8), to (PD). The Hamiltonian path problem asks: given a directed graph $G$, does it contain a (directed) path, starting and ending at any node, that visits every node exactly once?
Reduction:

Given a directed graph $G(V, \mathcal{E})$ with $|V| = n$, we construct a directed bipartite graph $G'(V_l \cup V_r, \mathcal{E}')$ as follows: for every node $v_i \in V$, define two nodes $v_{l,i} \in V_l$ and $v_{r,i} \in V_r$. Connect $v_{l,i}$ to $v_{r,i}$ via a directed edge. If a directed edge $(v_i, v_j)$ exists in $\mathcal{E}$, then introduce a directed edge from $v_{r,i}$ to $v_{l,j}$. That is, all the incoming edges to $v_i$ are connected to $v_{l,i}$ and all the outgoing edges are connected to $v_{r,i}$. One can easily show that the graph $G$ has a Hamiltonian path iff $G'$ has one; we omit the proof. We call the graph $G'$ the bipartite version of $G$.
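The construction of the bipartite version, and the equivalence of Hamiltonian paths in $G$ and $G'$, can be checked on small examples with the sketch below. The helper names and the node encoding `("l", i)` / `("r", i)` for $v_{l,i}$ / $v_{r,i}$ are our own; the brute-force path check is only for illustration on tiny graphs and is not part of the reduction.

```python
from itertools import permutations
from typing import Hashable, Iterable, List, Sequence, Tuple


def bipartite_version(n: int, edges: Iterable[Tuple[int, int]]):
    """Directed edges of G' for a directed graph G on nodes 0..n-1.

    Node ("l", i) stands for v_{l,i} and ("r", i) for v_{r,i}.
    """
    g_prime = [(("l", i), ("r", i)) for i in range(n)]   # v_{l,i} -> v_{r,i}
    g_prime += [(("r", i), ("l", j)) for i, j in edges]   # (v_i, v_j) gives v_{r,i} -> v_{l,j}
    return g_prime


def nodes_of_bipartite_version(n: int) -> List[Tuple[str, int]]:
    return [("l", i) for i in range(n)] + [("r", i) for i in range(n)]


def has_hamiltonian_path(nodes: Sequence[Hashable], edges) -> bool:
    """Brute force over all vertex orderings; only for very small graphs."""
    edge_set = set(edges)
    return any(all((p[s], p[s + 1]) in edge_set for s in range(len(p) - 1))
               for p in permutations(nodes))


path_graph = [(0, 1), (1, 2)]      # 0 -> 1 -> 2 is a Hamiltonian path
out_star = [(0, 1), (0, 2)]        # no Hamiltonian path
for edges in (path_graph, out_star):
    print(has_hamiltonian_path(range(3), edges),
          has_hamiltonian_path(nodes_of_bipartite_version(3), bipartite_version(3, edges)))
# True True
# False False
```

In the out-star example, both $v_{r,1}$ and $v_{r,2}$ have no outgoing edges in $G'$, and a directed path can end at only one of them, which is why $G'$ has no Hamiltonian path either.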

Define $T = 2n(n+1)(n+2)$. Consider an instance of the problem (PD) with $2n$ queues, each with $2nT + (2n-1)(n+2)$ packets, and $2nT + (2n-1)(n+2)$ servers. The servers are grouped into $4n - 1$ sets: $2n$ sets of $T$ servers each, and $2n - 1$ sets of $n + 2$ servers each. Let the sets of $T$ servers be called $A_{l,1}, \ldots, A_{l,n}, A_{r,1}, \ldots, A_{r,n}$, and the sets of $n + 2$ servers be called $B_1, \ldots, B_{2n-1}$. The servers within a set are consecutively indexed. We use the symbol $C < D$ to denote that the servers in the set $C$ have lower indices than those in the set $D$. We order the servers such that
$$A_{l,1} < B_1 < A_{r,1} < B_2 < A_{l,2} < B_3 < A_{r,2} < B_4 < A_{l,3} < \cdots < B_{2n-1} < A_{r,n}.$$

Let the set of $2n$ queues be $Q_{l,1}, \ldots, Q_{l,n}, Q_{r,1}, \ldots, Q_{r,n}$. Let $X(Q_{l,i}, S_j)$ denote the number of packets that the server $S_j$ can serve from the queue $Q_{l,i}$, and similarly for $Q_{r,i}$.

Fix a queue $Q_{l,i}$. Note that a server in the set $B_x$ has the index $xT + (x-1)(n+2) + j$ for some $j \in [n+2]$.



1) For every $x \in [n]$ and for each server $S_j \in A_{\ell,x}$, define $X(Q_{\ell,i}, S_j) = 1$. For every $x \in [n]$ and for each server $S_j \in A_{r,x}$, define $X(Q_{\ell,i}, S_j) = 0$.

2) Fix $x \in [2n-1]$, $x$ odd. For a server in $B_x$ with index $xT + (x-1)(n+2) + j$, define $X(Q_{\ell,i}, S_{xT+(x-1)(n+2)+j}) = i + 1$ if $j = i$, and $0$ otherwise.

3) Let $v_{r,g_1}, v_{r,g_2}, \ldots$ be the nodes that have an outgoing edge to the node $v_{\ell,i}$, with $g_1 > g_2 > \cdots$. Fix $x \in [2n-1]$, $x$ even. Define $X(Q_{\ell,i}, S_{xT+(x-1)(n+2)+g_1}) = n + 1 - g_1$ and $X(Q_{\ell,i}, S_{xT+(x-1)(n+2)+g_j}) = g_{j-1} - g_j$ for $j > 1$, so that the servers at positions $g_k, \ldots, g_1$ of $B_x$ together serve $n + 1 - g_k$ packets from $Q_{\ell,i}$. Define $X(Q_{\ell,i}, S_{xT+(x-1)(n+2)+j}) = 0$ for all other values of $j \in [n+2]$.

Perform the same construction for a queue $Q_{r,i}$, with $A_{r,x}$ replacing $A_{\ell,x}$ and vice versa in step 1, and with the words "even" and "odd" interchanged in steps 2 and 3. Define the number $W = 2nT + (2n-1)(n+2)$.
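As a quick consistency check of the server ordering and of the index formula $xT + (x-1)(n+2) + j$ for the $j$-th server of $B_x$, the sketch below builds the label of every server explicitly. It is a hypothetical helper written for this illustration; set labels are encoded as tuples such as `('A', 'l', x)` and `('B', x)`, which is our own convention. The list length equals $W$, and since $T = 2n(n+1)(n+2)$, the instance has $O(n^4)$ servers, polynomial in $n$.

```python
def server_layout(n):
    """Set label of every server, in index order.

    Order: A_{l,1} < B_1 < A_{r,1} < B_2 < A_{l,2} < ... < B_{2n-1} < A_{r,n}.
    """
    T = 2 * n * (n + 1) * (n + 2)
    labels = []
    for x in range(1, 2 * n):                  # x = 1, ..., 2n-1
        side = 'l' if x % 2 == 1 else 'r'      # odd x: an A_{l,.} set precedes B_x
        labels += [('A', side, (x + 1) // 2)] * T
        labels += [('B', x)] * (n + 2)
    labels += [('A', 'r', n)] * T              # the last set A_{r,n}
    return T, labels


def b_server_index(n, x, j):
    """1-based index of the j-th server of B_x, as stated in the text."""
    T = 2 * n * (n + 1) * (n + 2)
    return x * T + (x - 1) * (n + 2) + j


for n in range(1, 5):
    T, labels = server_layout(n)
    assert len(labels) == 2 * n * T + (2 * n - 1) * (n + 2)   # equals W
    for x in range(1, 2 * n):
        for j in range(1, n + 3):
            assert labels[b_server_index(n, x, j) - 1] == ('B', x)
print("index formula verified for n = 1..4")
```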

Here, steps 1 and 2 are generic and apply to any graph $G$, while step 3 depends on the graph structure. We now establish some basic properties of the possible server allocations for the above construction under the single-carrier constraint.

Property 1: If a queue, say $Q_{\ell,i}$, is not allocated any server from the set $\cup_{x\in[n]} A_{\ell,x}$, then the maximum total number of packets that can be served from the queues is less than $2nT$.
Proof: A server in a set $A_{\ell,x}$ can serve at most 1 packet. Further, that packet must be from a queue labeled $Q_{\ell,x'}$. Thus the maximum number of packets served by all the servers in the set $A_{\ell,x}$ is $T$. Suppose that for all $x \in [n]$, the servers in the sets $A_{\ell,x}$ serve 1 packet each. Since by hypothesis at most $n - 1$ queues in $\{Q_{\ell,1}, \ldots, Q_{\ell,n}\}$ can be allocated a server in $\cup_x A_{\ell,x}$, by the pigeonhole principle, at least one queue $Q_{\ell,j}$ must be served by servers in $A_{\ell,x}$ and $A_{\ell,x+r}$ for some $x$ and some $r \ge 1$. Consequently, as a result of the single-carrier constraint, all the servers in $A_{r,x}$ must be allocated to $Q_{\ell,j}$, each serving 0 packets from $Q_{\ell,j}$. Thus the maximum number of packets that can be served by the servers in $\cup_x A_{r,x}$ is $(n-1)T$. The total number of packets that can be served by the servers in $\cup_x A_{\ell,x}$ is at most $nT$.

For $x \in [2n-1]$, the maximum number of packets that can be served by a server in $B_x$ is $n + 1$. The total number of servers in any set $B_x$ is $n + 2$. Thus the total number of packets that can be served by all the servers in $\cup_x B_x$ is at most $(2n-1)(n+1)(n+2)$. Since $T = 2n(n+1)(n+2)$, we have
$$(n-1)T + nT + (2n-1)(n+1)(n+2) < 2nT.$$
∎
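The final inequality of the proof can be checked symbolically: the gap $2nT - [(n-1)T + nT + (2n-1)(n+1)(n+2)]$ equals $T - (2n-1)(n+1)(n+2) = (n+1)(n+2)$. A minimal Python check of this arithmetic (the function name is ours):

```python
def property1_gap(n):
    """2nT minus the upper bound (n-1)T + nT + (2n-1)(n+1)(n+2) from Property 1."""
    T = 2 * n * (n + 1) * (n + 2)
    bound = (n - 1) * T + n * T + (2 * n - 1) * (n + 1) * (n + 2)
    return 2 * n * T - bound


# The gap simplifies to T - (2n-1)(n+1)(n+2) = (n+1)(n+2), which is always positive.
for n in range(1, 100):
    assert property1_gap(n) == (n + 1) * (n + 2)
print(property1_gap(3))  # 20
```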

By symmetry, the above property also holds for a queue $Q_{r,i}$. Thus, any allocation that serves $W = 2nT + (2n-1)(n+2)$ packets must allocate server(s) from $\cup_{x\in[n]} A_{\ell,x}$ (resp. $\cup_{x\in[n]} A_{r,x}$) to each queue $Q_{\ell,i}$ (resp. $Q_{r,i}$).

Property 2: Exactly one of the following statements is true:
1) There exists a permutation $\sigma$ (resp. $\pi$) of $[n]$ such that all the servers in $A_{\ell,i}$ (resp. $A_{r,i}$) are allocated to $Q_{\ell,\sigma_i}$ (resp. $Q_{r,\pi_i}$).
2) The allocation serves a total of $W' < 2nT$ packets from all the queues.

Proof: If statement 1 holds, then evidently a total of $2nT$ or more packets are served, so statement 2 cannot hold. If statement 1 does not hold, then WLOG suppose a queue $Q_{\ell,i}$ is allocated servers from $A_{\ell,x}$ and $A_{\ell,x+r}$ for some $x$ and some $r \ge 1$. As before, the servers in $A_{r,x}$ are allocated to $Q_{\ell,i}$, serving 0 packets. Hence, as established in the proof of Property 1, the allocation serves a total of $W' < 2nT$ packets, and statement 2 holds. ∎
Thus, if an allocation serves $2nT$ or more packets, then for every $x \in [2n-1]$, the servers in $B_x$ serve at most 2 queues, and the queues (if two) are of the form $(Q_{\ell,i}, Q_{r,j})$.

Property 3: Let an allocation serve a total of at least $2nT$ packets from all the queues. If all the servers in a set $B_x$ are allocated to the same queue, say $Q_{\ell,i}$, then the total number of packets served by the servers in $B_x$ is at most $n + 1$.
Proof: If $x$ is odd, then exactly 1 server from $B_x$ can serve a nonzero number of packets from $Q_{\ell,i}$, and that number equals $i + 1 \le n + 1$. If $x$ is even, then the servers in $B_x$ can serve at most $n + 1 - 1 = n < n + 1$ packets from $Q_{\ell,i}$. ∎
An allocation of servers to the queues is said to be normal if there exists a permutation $\sigma$ (resp. $\pi$) of $[n]$ such that all the servers in $A_{\ell,i}$ (resp. $A_{r,i}$) are allocated to $Q_{\ell,\sigma_i}$ (resp. $Q_{r,\pi_i}$).
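The single-carrier constraint and the notion of a normal allocation are both easy to test on an explicit allocation. The sketch below assumes an allocation is given as a list with one queue label per server in index order, such as `('l', 2)` for $Q_{\ell,2}$, with `None` marking an idle server; these encodings, the toy set sizes, and the function names are hypothetical choices made only for this illustration.

```python
def layout(n, T, b_size):
    """Set label per server: A_{l,1}, B_1, A_{r,1}, B_2, ..., B_{2n-1}, A_{r,n}."""
    labels = []
    for x in range(1, 2 * n):
        labels += [('A', 'l' if x % 2 else 'r', (x + 1) // 2)] * T
        labels += [('B', x)] * b_size
    return labels + [('A', 'r', n)] * T


def is_single_carrier(alloc):
    """True if each queue's servers form one contiguous run (None = idle server)."""
    finished = set()              # queues whose run has already ended
    prev = None
    for q in alloc:
        if q != prev:
            if prev is not None:
                finished.add(prev)
            if q is not None and q in finished:
                return False      # the queue reappears after a gap
            prev = q
    return True


def is_normal(n, alloc, labels):
    """Each A_{s,x} goes wholly to one queue (s, sigma_x), with sigma a permutation."""
    for side in ('l', 'r'):
        owners = []
        for x in range(1, n + 1):
            qs = {q for q, lab in zip(alloc, labels) if lab == ('A', side, x)}
            if len(qs) != 1:
                return False
            (q,) = qs
            if q is None or q[0] != side:
                return False
            owners.append(q[1])
        if sorted(owners) != list(range(1, n + 1)):
            return False
    return True


# Toy sizes (T = 2, |B_x| = 1), not the sizes used in the reduction.
n, T, b = 2, 2, 1
labels = layout(n, T, b)
sigma, pi = {1: 2, 2: 1}, {1: 1, 2: 2}
alloc, owner = [], None
for lab in labels:
    if lab[0] == 'A':
        owner = (lab[1], (sigma if lab[1] == 'l' else pi)[lab[2]])
    alloc.append(owner)           # each B server joins the queue on its left
print(is_single_carrier(alloc), is_normal(n, alloc, labels))  # True True
```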

Property 4: Fix $x \in [2n-1]$, $x$ odd. Under a normal allocation, let the servers in $B_x$ serve two queues $(Q_{\ell,i}, Q_{r,j})$. If there exists a directed edge $(v_{\ell,i}, v_{r,j}) \in \mathcal{E}'$, then the servers in $B_x$ serve a total of at most $n + 2$ packets; otherwise, they serve at most $n + 1$ packets.

Proof: Suppose the servers in $B_x$ serve two queues $(Q_{\ell,i}, Q_{r,j})$ with $(v_{\ell,i}, v_{r,j}) \in \mathcal{E}'$. There is exactly one server $S_t$ in $B_x$ that serves $Q_{\ell,i}$ at a nonzero rate of $i + 1$ packets. If this server $S_t$ is not allocated to $Q_{\ell,i}$, then no packets are served from $Q_{\ell,i}$, and the number of packets served from $Q_{r,j}$ is at most $n + 1$ by Property 3: the number of packets served from $Q_{r,j}$ cannot be more than if all the servers in $B_x$ were allocated to $Q_{r,j}$.

If $S_t$ is allocated to $Q_{\ell,i}$, then because $x$ is odd and the allocation is normal, the servers in $B_x$ with indices less than $t$ are also allocated to $Q_{\ell,i}$. The maximum number of packets that can be served from $Q_{r,j}$ by allocating to it all the servers in $B_x$ with indices higher than $t$ is $n - i + 1$, implying a total of at most $n + 2$ packets.

If there does not exist a directed edge $(v_{\ell,i}, v_{r,j})$ in $\mathcal{E}'$, then, even after allocating $S_t$ to $Q_{\ell,i}$ and all the servers in $B_x$ with indices higher than $t$ to $Q_{r,j}$, the maximum number of packets served from $Q_{r,j}$ is at most $n - z + 1$ for some $z > i$, implying a total of at most $n + 1$ packets. ∎

If the allocation of servers in $B_x$ to the queues $(Q_{\ell,i}, Q_{r,j})$ serves a total of $n + 2$ packets, we call it a drain-maximizing allocation for $B_x$.

A statement similar to Property 4 can be proved for $B_x$ with even $x$ and an edge $(v_{r,j}, v_{\ell,i}) \in \mathcal{E}'$. We are now in a position to prove that a Hamiltonian path exists in $G'$ if and only if there exists an allocation of servers to the queues that serves at least $W = 2nT + (n+2)(2n-1)$ packets.



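First suppose there exists an allocation that serves at least $W$ packets. Then it must be normal, and for every $B_x$ it serves exactly 2 queues, one from $\{Q_{\ell,1}, \ldots, Q_{\ell,n}\}$ and the other from $\{Q_{r,1}, \ldots, Q_{r,n}\}$, namely the same queues $Q_{\ell,i}$ and $Q_{r,j}$ that are served by the adjacent server sets $A_{\ell,\cdot}$ and $A_{r,\cdot}$. Moreover, since the sets $A_{\ell,\cdot}$ and $A_{r,\cdot}$ serve at most $2nT$ packets in total and each of the $2n - 1$ sets $B_x$ serves at most $n + 2$ packets, every $B_x$ must have a drain-maximizing allocation.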

Thus the queues $Q_{\ell,\sigma_1}, Q_{r,\pi_1}, Q_{\ell,\sigma_2}, Q_{r,\pi_2}, \ldots, Q_{\ell,\sigma_n}, Q_{r,\pi_n}$ are served in order in consecutive server blocks. Consider the path
$$v_{\ell,\sigma_1} \to v_{r,\pi_1} \to v_{\ell,\sigma_2} \to v_{r,\pi_2} \to \cdots \to v_{\ell,\sigma_n} \to v_{r,\pi_n}.$$
This is a valid path in the graph $G'$ (Property 4), and because $\sigma$ and $\pi$ are permutations, it visits every node exactly once. Therefore it is a Hamiltonian path.

Next suppose that there is a Hamiltonian path in $G'$; WLOG call it
$$v_{\ell,\sigma_1} \to v_{r,\pi_1} \to v_{\ell,\sigma_2} \to v_{r,\pi_2} \to \cdots \to v_{\ell,\sigma_n} \to v_{r,\pi_n}.$$
Then allocating to the queue $Q_{\ell,\sigma_i}$ the servers in $A_{\ell,i}$, to the queue $Q_{r,\pi_i}$ the servers in $A_{r,i}$, and using the drain-maximizing allocation for each $B_x$ (which is possible because of Property 4), we get an allocation that serves exactly $W = 2nT + (n+2)(2n-1)$ packets. This completes the reduction. Since $T = O(n^3)$, this is a polynomial-time reduction. Therefore the problem (PD) is NP-complete.

Appendix B 
Proof of Theorem 2

The problem (PM) clearly belongs to the class NP: a certificate is an allocation of the servers to the queues that has a weight of at least $W$. To show that it is NP-complete, we use the same reduction to the Hamiltonian path problem as before. We take each queue to have a length of 1 packet, and ask whether $W = 2nT + (2n-1)(n+2)$ units of total service can be offered, which translates to a schedule weight of $W$. We omit the details.

Appendix C 
Proof of Lemma 1

Let $z = \lfloor r/k \rfloor$. Adding dummy nodes if necessary to the set $U$, and removing some nodes if necessary from the set $V$, we construct a graph $G'(U' \cup V', \mathcal{E}')$ where $|V'| = kz$ and $|U'| = z$. For a pair of nodes $u' \in U'$ and $v' \in V'$:

1) If $u' \in U$, then for any $v' \in V' \subseteq V$, $(u', v') \in \mathcal{E}'$ if and only if $(u', v') \in \mathcal{E}$.

2) If $u' \notin U$, then for any $v' \in V'$, the edge $(u', v') \in \mathcal{E}'$ with probability $q$, independently of all other random variables.

Group the nodes in the set $V'$ as described in the SA algorithm, to get a bipartite graph $G''(U' \cup V'', \mathcal{E}'')$, where $V''$ is the set of groups of nodes in $V'$, and nodes $u' \in U'$, $v'' \in V''$ are connected by an edge in $\mathcal{E}''$ if the node $u'$ is connected to every node in the group $v''$. Thus between any pair of nodes in $U' \times V''$, an edge exists with probability $q^k$.
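A small simulation can make the grouping step concrete. The SA algorithm itself is specified elsewhere in the paper, so the sketch below is an assumption-laden stand-in: it takes groups to be consecutive blocks of $k$ nodes of $V'$, draws each edge independently with probability $q$ (as in case 2 above), and computes a maximum-cardinality matching with Kuhn's simple augmenting-path method rather than any specific algorithm from the paper. All function names are hypothetical.

```python
import random


def group_graph(adj, num_v, k):
    """Hypothetical grouping: consecutive blocks of k nodes of V' form one group of V''.

    adj[u] is the set of V'-nodes adjacent to u; u is joined to a group only if it
    is adjacent to every node of that group.
    """
    groups = [range(g * k, (g + 1) * k) for g in range(num_v // k)]
    return {u: [g for g, grp in enumerate(groups) if all(v in nbrs for v in grp)]
            for u, nbrs in adj.items()}


def max_matching(gadj):
    """Maximum-cardinality bipartite matching via augmenting paths (Kuhn's algorithm)."""
    match_of_group = {}

    def try_augment(u, visited):
        for g in gadj[u]:
            if g not in visited:
                visited.add(g)
                if g not in match_of_group or try_augment(match_of_group[g], visited):
                    match_of_group[g] = u
                    return True
        return False

    return sum(try_augment(u, set()) for u in gadj)


def perfect_matching_rate(z, k, q, trials=200, seed=0):
    """Fraction of random trials (edge probability q) in which G'' has a perfect matching."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        adj = {u: {v for v in range(k * z) if rng.random() < q} for u in range(z)}
        hits += max_matching(group_graph(adj, k * z, k)) == z
    return hits / trials


print(perfect_matching_rate(z=20, k=2, q=0.7))
```

Kuhn's method is chosen here only for brevity; it is slower than the $O(n^{2.5})$ matching algorithms mentioned in the complexity discussion, but it returns the same matching size.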

For $z$ large enough, the graph $G''$ has a perfect matching $\mathcal{M}''$ with probability at least $1 - 3z(1 - q^k)^z$ ([12], Lemma 1). Removing the "dummy" nodes that were added to get the set $U'$ from $U$, we get a matching $\mathcal{M}$ as the output of the SA algorithm with $|U| = |\mathcal{M}|$. That is, a perfect matching in the graph $G''$ (deterministically) yields a matching of cardinality $|U|$ as the output of the SA algorithm. Therefore, for $r$ large enough,
$$\mathbb{P}(|\mathcal{M}| < |U|) \le 3\lfloor r/k \rfloor \left(1 - q^k\right)^{\lfloor r/k \rfloor}.$$
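To see how large $r$ must be before the bound of Lemma 1 becomes useful, it can simply be evaluated. The helper names below are ours; `smallest_z_below` relies on the fact that $z(1-p)^z$ is decreasing for $z > -1/\log(1-p)$, and it assumes $0 < q < 1$. Note that the bound exceeds 1, and is therefore vacuous, for small $r$.

```python
import math


def lemma1_bound(r, k, q):
    """3*floor(r/k)*(1 - q**k)**floor(r/k), the bound on P(|M| < |U|)."""
    z = r // k
    return 3 * z * (1 - q ** k) ** z


def smallest_z_below(delta, k, q):
    """Smallest z beyond the peak of 3z(1 - q^k)^z at which the bound is <= delta.

    Assumes 0 < q < 1; the map z -> z(1-p)^z decreases for z > -1/log(1-p).
    """
    p = q ** k
    z = max(1, math.ceil(-1 / math.log(1 - p)))
    while 3 * z * (1 - p) ** z > delta:
        z += 1
    return z


for r in (50, 100, 200):
    print(r, lemma1_bound(r, k=2, q=0.5))
z = smallest_z_below(1e-3, k=2, q=0.5)
print(z, 2 * z)   # number of groups needed, and the corresponding r = kz
```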

Appendix D 
Proof of Lemma 2

The proof proceeds in two steps: first we show that for large $n$, with high probability, $a = 0$ holds in step 3 of the BA algorithm. In the process, we show that the number of "excess servers" $n'$ (step 3 of the BA algorithm) is at least $n\alpha/2$ with high probability. Next, under the condition $a = 0$ and $n' \ge n\alpha/2$, we show that the probability of $\{Q(t+1) > Q(t)\}$ is small.
Step 1: 

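For $0 \le i \le M$, let $p'_i := |\{k \in [n] : A_k(t+1) = i\}|/n$ be the fraction of the $n$ queues that see exactly $i$ arrivals in timeslot $t + 1$. Let $p' = [p'_0, p'_1, \ldots, p'_M]$. Choose any $\varepsilon \in (0, \alpha/(2Mm_0))$, say $\varepsilon = \alpha/(4Mm_0)$. By Sanov's theorem ([13], Thm. 2.1.10), for any $\rho \in (0,1)$, for $n$ large enough, $\mathbb{P}(p' \notin \mathcal{B}_\varepsilon) \le e^{-n\rho\tau(\varepsilon)}$. Since the set $\mathcal{A}_{M+1} \setminus \mathcal{B}_\varepsilon$ is compact and the function $g(y) = \sum_{i=0}^{M} y_i \log(y_i/p_i)$ is lower semicontinuous ([13], Chapter 2, Exercise 2.1.22), the infimum in the definition of $\tau(\cdot)$ is achieved and is strictly positive (because $g(y) = 0 \iff y = p$, $p \in \mathcal{B}_\varepsilon$, and $g(y) \ge 0$ for all $y$). Thus $\tau(\varepsilon) > 0$, implying
$$\mathbb{P}\left(|p_i - p'_i| < \varepsilon,\ \forall i \in \{0, 1, \ldots, M\}\right) \ge 1 - e^{-n\rho\tau(\varepsilon)}.$$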
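The concentration statement above can be illustrated numerically. The sketch below assumes, as the displayed inequality suggests, that $\mathcal{B}_\varepsilon$ is the set of distributions within $\varepsilon$ of $p$ in every coordinate; it computes the function $g$ and a Monte Carlo estimate of $\mathbb{P}(p' \notin \mathcal{B}_\varepsilon)$ for a hypothetical arrival distribution. It does not compute $\tau(\varepsilon)$ itself, and the function names are ours.

```python
import math
import random


def kl(y, p):
    """g(y) = sum_i y_i log(y_i / p_i), using the convention 0 log 0 = 0."""
    return sum(yi * math.log(yi / pi) for yi, pi in zip(y, p) if yi > 0)


def empirical_pmf(samples, M):
    """p'_i = fraction of the n queues that saw exactly i arrivals, i = 0..M."""
    counts = [0] * (M + 1)
    for a in samples:
        counts[a] += 1
    return [c / len(samples) for c in counts]


def deviation_prob(p, n, eps, trials=500, seed=1):
    """Monte Carlo estimate of P(max_i |p'_i - p_i| >= eps) for n i.i.d. queues."""
    rng = random.Random(seed)
    support = range(len(p))
    bad = 0
    for _ in range(trials):
        pp = empirical_pmf(rng.choices(support, weights=p, k=n), len(p) - 1)
        bad += max(abs(a - b) for a, b in zip(pp, p)) >= eps
    return bad / trials


p = [0.5, 0.3, 0.2]                      # hypothetical arrival pmf with M = 2
for n in (50, 200, 800):
    print(n, deviation_prob(p, n, eps=0.05))
```

The estimates shrink rapidly as $n$ grows, which is the qualitative content of the exponential bound; the exact rate $\tau(\varepsilon)$ would require minimizing `kl` over the complement of $\mathcal{B}_\varepsilon$.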

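Let $Q(t) = m$. Define the set $\mathcal{C}_r := \{i \in [n] : (r-1)K + 1 \le A_i(t+1) \le rK\}$. Since $Q_i(t) \le m$ for all $i$, $\mathcal{D}_r \subseteq \cup_{s=r}^{m_0} \mathcal{C}_s$. Hence,
$$|\mathcal{D}_r| \le |\mathcal{C}_r| + |\mathcal{C}_{r+1}| + \cdots + |\mathcal{C}_{m_0}| = n\left(p'_{(r-1)K+1} + p'_{(r-1)K+2} + \cdots + p'_M\right),$$
implying
$$\sum_{r=1}^{m_0} r|\mathcal{D}_r| = \sum_{r=1}^{m_0}\sum_{s=r}^{m_0}|\mathcal{D}_s| \le n\sum_{r=1}^{m_0}\left(p'_{(r-1)K+1} + \cdots + p'_M\right) = n\sum_{i=0}^{M}\left\lceil \frac{i}{K} \right\rceil p'_i \overset{(a)}{\le} n\sum_{i=0}^{M}\left\lceil \frac{i}{K} \right\rceil p_i + n\varepsilon M m_0 \le n(1-\alpha) + n\varepsilon M m_0,$$
where step (a) holds with probability at least $1 - e^{-n\rho\tau(\varepsilon)}$. Since $\varepsilon < \alpha/(2Mm_0)$, we have $\sum_{r=1}^{m_0} r|\mathcal{D}_r| < n - n\alpha/2$, or $a = 0$ and $n' \ge n\alpha/2$ in step 3 of the BA algorithm, with probability at least $1 - e^{-n\rho\tau(\varepsilon)}$.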
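The two counting steps at the start of this chain, bounding the required servers by $\lceil A_i(t+1)/K \rceil$ per queue and rewriting the tail sums as $\sum_i \lceil i/K \rceil p'_i$, can be checked exactly with rational arithmetic. The sketch below uses hypothetical arrival counts and assumes $m_0 = \lceil M/K \rceil$ and that there are $n$ servers in total, consistent with the bound $n' \ge n\alpha/2$; neither assumption is stated in this part of the text, and the function names are ours.

```python
from fractions import Fraction


def ceil_div(a, K):
    """Integer ceiling of a / K for nonnegative a."""
    return -(-a // K)


def servers_needed(arrivals, K):
    """Sum over queues of ceil(A_i / K): an upper bound on sum_r r|D_r| when Q_i(t) <= m."""
    return sum(ceil_div(a, K) for a in arrivals)


def tail_sum_form(pmf, K, m0):
    """sum_{r=1}^{m0} (p'_{(r-1)K+1} + ... + p'_M), as in the chain above."""
    return sum(sum(pmf[(r - 1) * K + 1:]) for r in range(1, m0 + 1))


arrivals = [0, 1, 2, 0, 1, 0, 3, 1]      # hypothetical A_i(t+1) for n = 8 queues
K, M = 2, 3
m0 = ceil_div(M, K)                      # assumed m0 = ceil(M / K)
n = len(arrivals)
pmf = [Fraction(arrivals.count(i), n) for i in range(M + 1)]
assert servers_needed(arrivals, K) == n * tail_sum_form(pmf, K, m0)
print("excess servers n' =", n - servers_needed(arrivals, K))  # assumes n servers
```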
Step 2: 

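We assume that $a = 0$ and $n' \ge n\alpha/2$ in step 3 of the BA algorithm. Consider the event $\mathcal{E}_i$ that each of the queues in the set $\mathcal{D}_i$ is allocated at least $i$ servers. If the event $\mathcal{E}_i$ occurs for every $i \in \{1, 2, \ldots, m_0\}$, then the maximum queue-length at the end of timeslot $t + 1$ is at most $m$. The event $\mathcal{E}_i$ occurs if, in the server allocation step (step 5) of the BA algorithm, the matching obeys $|\mathcal{M}_i| = |\mathcal{D}_i|$.

Fix any $i \in \{1, 2, \ldots, m_0\}$. We have $|\mathcal{T}_i| \ge i|\mathcal{D}_i| + n'/(2(m_0 - a + 1)) \ge i|\mathcal{D}_i| + n'/(2(m_0 + 1))$, and $|\mathcal{T}_i|/i \ge |\mathcal{D}_i| + n'/(2m_0(m_0+1)) \ge |\mathcal{D}_i| + n\alpha/(4m_0(m_0+1))$. Thus, from Lemma 1,
$$\mathbb{P}(|\mathcal{M}_i| = |\mathcal{D}_i|) \ge 1 - 3\left\lfloor \frac{n\alpha}{4m_0(m_0+1)} \right\rfloor \left(1 - q^{m_0}\right)^{\left\lfloor \frac{n\alpha}{4m_0(m_0+1)} \right\rfloor}.$$
Hence, by the union bound,
$$\mathbb{P}\left(|\mathcal{M}_i| = |\mathcal{D}_i|\ \forall i \in [m_0]\right) \ge 1 - 3m_0\left\lfloor \frac{n\alpha}{4m_0(m_0+1)} \right\rfloor \left(1 - q^{m_0}\right)^{\left\lfloor \frac{n\alpha}{4m_0(m_0+1)} \right\rfloor}.$$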

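Combining the results of steps 1 and 2 and once again using the union bound,
$$\mathbb{P}(Q(t+1) > Q(t)) \le e^{-n\rho\tau(\varepsilon)} + 3m_0\left\lfloor \frac{n\alpha}{4m_0(m_0+1)} \right\rfloor \left(1 - q^{m_0}\right)^{\left\lfloor \frac{n\alpha}{4m_0(m_0+1)} \right\rfloor},$$
completing the proof.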
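For concreteness, the final bound can be evaluated for a few values of $n$. The sketch takes $q$, $\alpha$ and $m_0$ as in the lemma and treats the product $\rho\tau(\varepsilon)$ as a single hypothetical positive constant `rho_tau`, since the proof only shows that it is positive; the parameter values in the usage lines are arbitrary.

```python
import math


def lemma2_bound(n, m0, alpha, q, rho_tau):
    """Right-hand side of the final bound on P(Q(t+1) > Q(t)).

    rho_tau stands in for rho * tau(eps); the proof only establishes that it is
    positive, so any value passed here is a hypothetical placeholder.
    """
    z = math.floor(n * alpha / (4 * m0 * (m0 + 1)))
    return math.exp(-n * rho_tau) + 3 * m0 * z * (1 - q ** m0) ** z


for n in (10**3, 10**4, 10**5):
    print(n, lemma2_bound(n, m0=2, alpha=0.2, q=0.5, rho_tau=0.01))
```

Both terms decay exponentially in $n$, so the bound is vanishing for large $n$ even though it may exceed 1 for moderate $n$.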



Appendix E 
Proof of Lemma 3

Suppose that at the end of timeslot $t$, the maximum queue-length is $m$ and the number of queues at length $m$ is $x$. Our objective is to show that at the end of timeslot $t + 1$, with probability at least $1 - e^{-n\phi}$ for some $\phi > 0$,

1) the maximum queue-length is at most $m$, and

2) the number of queues at the maximum is at most $(x - n\alpha/4)^+$.

Since $x \le n$, properties 1 and 2 and the union bound imply that with probability at least $1 - k_0 e^{-n\phi}$, at the end of $k_0 = \lceil 4/\alpha \rceil$ timeslots, the maximum queue-length is at most $m - 1$.
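The choice $k_0 = \lceil 4/\alpha \rceil$ comes from repeatedly applying property 2: starting from at most $n$ queues at the maximum, each timeslot removes at least $n\alpha/4$ of them. The idealized sketch below ignores the failure probability and simply iterates $x \mapsto (x - n\alpha/4)^+$ with exact fractions; the function names are ours.

```python
import math
from fractions import Fraction


def steps_until_max_drops(n, x, alpha):
    """Iterate x -> (x - n*alpha/4)^+ until no queue remains at the old maximum.

    Idealized: assumes property 2 holds in every timeslot.
    """
    step = Fraction(n) * alpha / 4
    x, k = Fraction(x), 0
    while x > 0:
        x = max(Fraction(0), x - step)
        k += 1
    return k


def k0(alpha):
    """k_0 = ceil(4 / alpha)."""
    return math.ceil(4 / alpha)


alpha = Fraction(3, 10)
print(k0(alpha), max(steps_until_max_drops(100, x, alpha) for x in range(101)))  # 14 14
```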

First consider the case $x = n$, i.e., all the queues in the system are equal in length. From Lemma 2, for $n$ large, the probability that $Q(t+1) > m$ is upper-bounded by $e^{-n\theta_1}$ for some $\theta_1 > 0$, so property 1 is satisfied. Next, the BA algorithm allocates to the queues in the sets $\mathcal{D}_{c+1}, \mathcal{D}_{c+2}, \ldots, \mathcal{D}_{m_0}$ one more server than is necessary to bring their length to $m$, and does the same for $n'/2 - (d_{c+1} + d_{c+2} + \cdots + d_{m_0})$ queues in $\mathcal{D}_c$. Thus, at the end of timeslot $t + 1$, the number of queues at length $m$ is at most $(n - n'/2)^+$, and by the proof of Lemma 2, the probability of this event is at least $1 - e^{-n\theta_2}$ for some $\theta_2 > 0$. Since $n' \ge n\alpha/2$ with probability at least $1 - e^{-n\theta_3}$ for some $\theta_3 > 0$ (from the proof of Lemma 2), if we choose $\phi = \min\{\theta_1, \theta_2, \theta_3\}$, then property 2 is satisfied for the case $x = n$. The case $x < n$ is almost identical; we omit the details for the sake of brevity.

Appendix F 
Proof of Theorem 3

The proof is almost identical to that of Theorem 5 in [14]. In particular, Lemma 3 shows that the maximum queue-length in the system decreases by at least 1 (provided it is nonzero to begin with) over a constant number of timeslots, with probability at least 1/2. Lemma 2 shows that in a given timeslot, it increases by at most $M$, and the probability of this increase is at most $e^{-nC}$ for some $C = C(\alpha, \rho) > 0$, for $n$ large. Using the same stationary-distribution bounding techniques as those in the proof of Theorem 5 in [14], we conclude that

implying the desired result because $\rho < 1$ is arbitrary (formally, taking the limit of both sides as $\rho \to 1$).



Steps 1 and 6 of the BA algorithm can be performed in $O(n^2)$ computations each. Steps 2 and 4 can be performed in $O(n)$ computations each. Step 3 can be performed in $O(1)$ computations.

Step 5 requires finding largest-cardinality matchings in bipartite graphs. Given a bipartite graph with $O(n)$ nodes, the largest-cardinality matching can be found in $O(n^{2.5})$ computations [15]. In our case, we need to find largest-cardinality matchings in bipartite graphs with $2n_1, 2n_2, \ldots, 2n_{m_0}$ nodes, respectively, with $n_1 + n_2 + \cdots + n_{m_0} = n$. Hence the computational effort is
$$O(n_1^{2.5} + n_2^{2.5} + \cdots + n_{m_0}^{2.5}) = O(n^{2.5}).$$
Thus, the BA algorithm can be implemented in $O(n^{2.5})$ computations per timeslot.
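The last equality uses the superadditivity of $x \mapsto x^{2.5}$ (a convex function with value 0 at 0): $\sum_i n_i^{2.5} \le (\sum_i n_i)^{2.5}$. A small numerical illustration, in which the cost model and helper names are ours and constant factors are dropped:

```python
import random


def matching_cost(sizes, exponent=2.5):
    """Cost model: one O(N^2.5) matching per group of N = n_i queues (constants dropped)."""
    return sum(s ** exponent for s in sizes)


def random_partition(n, parts, rng):
    """Split n into `parts` positive integers n_1 + ... + n_parts = n (needs n >= parts)."""
    cuts = sorted(rng.sample(range(1, n), parts - 1))
    return [b - a for a, b in zip([0] + cuts, cuts + [n])]


rng = random.Random(0)
n = 400
sizes = random_partition(n, parts=5, rng=rng)
print(sizes, matching_cost(sizes) / n ** 2.5)   # the ratio is at most 1
```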



Consider the following event that leads to overflow: fix $\theta \in (0, M/K - 1)$, and suppose that for $t_0$ timeslots up to and including timeslot 0, the total number of arrivals to all the queues has an empirical mean of more than $nK(1+\theta)$. That is, if $\hat{p}_j(t) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{A_i(t) = j\}$, then for $-t_0 < t \le 0$, we have $\sum_{j=0}^{M} j\,\hat{p}_j(t) > K(1+\theta)$. Since the system can serve at most $nK$ packets in a given timeslot, this event leads to an overflow at the end of timeslot 0 under any algorithm.
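The overflow argument only uses the fact that at most $nK$ packets leave the system per timeslot, whatever the scheduler does. The sketch below turns that into an explicit lower bound on the largest queue, assuming the queues start empty at the beginning of the window; the function names and numbers are hypothetical.

```python
def backlog_lower_bound(total_arrivals, n, K):
    """Lower bound on the total backlog after the given slots, starting empty,
    for ANY scheduler that serves at most n*K packets per timeslot."""
    backlog = 0
    for a in total_arrivals:
        backlog = max(0, backlog + a - n * K)
    return backlog


def max_queue_lower_bound(total_arrivals, n, K):
    """Some queue holds at least the average backlog (pigeonhole)."""
    return backlog_lower_bound(total_arrivals, n, K) / n


# Hypothetical numbers: n = 10, K = 2, theta = 0.5, so each slot brings nK(1+theta) = 30.
n, K, theta, t0 = 10, 2, 0.5, 8
arrivals = [int(n * K * (1 + theta))] * t0
print(max_queue_lower_bound(arrivals, n, K))   # K*theta*t0 = 8.0
```

With a constant excess of $nK\theta$ packets per slot, the largest queue grows at least linearly in $t_0$, which is why a long enough run of heavy arrivals forces an overflow of any fixed buffer.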
Analyzing the probability of the event that leads to overflow: Fix any $\rho \in (0,1)$. By Sanov's theorem ([13], Thm. 2.1.10), for any timeslot $t$, the probability of the empirical mean of the arrivals exceeding $K(1+\theta)$ is at least $e^{-n\rho^{-1}\zeta(\theta)}$ for $n$ large. Since $\theta < M/K - 1$, the set $\mathcal{A}_{M+1} \setminus \mathcal{C}_\theta$ is nonempty: $[0, 0, \ldots, 0, 1] \in \mathcal{A}_{M+1} \setminus \mathcal{C}_\theta$. Hence, by the usual arguments of compactness and lower semicontinuity, the infimum in the definition of $\zeta(\cdot)$ is achieved and is finite and strictly positive. By the independence of arrivals across timeslots, the probability of the overflow event is thus at least $e^{-n\rho^{-1}t_0\zeta(\theta)}$, implying (because $\rho < 1$ is arbitrary)
$$\liminf_{n\to\infty} \frac{1}{n}\log \mathbb{P}\left(\max_i Q_i(t) > b\right) \ge -t_0\zeta(\theta).$$





Appendix G 
Proof of Theorem 4



Appendix H 
Proof of Theorem 5




